-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Add the Uri\Rfc3986\Uri class to ext/uri without wither support #18836
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
43c1d9f
to
155e070
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cursory glance, I see there's some todos wrt memory management as well so maybe I'll wait a bit on that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some first remarks from just looking at the diff on GitHub.
ext/uri/php_uriparser.c
Outdated
UriUriA *uri, zend_string *uri_str, | ||
UriUriA *normalized_uri, zend_string *normalized_uri_str | ||
) { | ||
uriparser_uris_t *uriparser_uris = emalloc(sizeof(uriparser_uris_t)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uriparser_uris_t *uriparser_uris = emalloc(sizeof(uriparser_uris_t)); | |
uriparser_uris_t *uriparser_uris = emalloc(sizeof(*uriparser_uris)); |
Prevents the two types from going out of sync (same elsewhere).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something I'm only noticing now: I think this should rather be called ext/uri/parser_rfc3986.c
and php_lexbor.c
would be ext/uri/parser_whatwg.c
. The php_uriparser.c
name is confusing to me, because uriparser
is an extremely generic term.
To give another comparison with ext/random, since it is architecturally similar: Each engine has its own engine_enginename.c
file, e.g. engine_xoshiro256starstar.c
.
efree(uriparser_uris); | ||
} | ||
|
||
const uri_handler_t uriparser_uri_handler = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And similarly the public symbols in the file should also be prefixed with php_uri_
. Just uriparser_
can conflict with the uriparser library itself.
Here I would suggest php_uri_rfc3986_handler
.
But I'm also happy to leave this to a follow-up to not introduce too much churn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this and the file renaming is something that I'm happy to do after most parts are merged.
ext/uri/php_uriparser.h
Outdated
typedef struct uriparser_uris_t { | ||
UriUriA *uri; | ||
zend_string *uri_str; | ||
UriUriA *normalized_uri; | ||
zend_string *normalized_uri_str; | ||
} uriparser_uris_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typedef struct uriparser_uris_t { | |
UriUriA *uri; | |
zend_string *uri_str; | |
UriUriA *normalized_uri; | |
zend_string *normalized_uri_str; | |
} uriparser_uris_t; | |
typedef struct uriparser_uris_t { | |
UriUriA uri; | |
zend_string *uri_str; | |
UriUriA normalized_uri; | |
zend_string *normalized_uri_str; | |
} uriparser_uris_t; |
From what I see, the uriparser library does not allocate UriUriA
structs itself, but requires you to pass in an OUT-pointer. Thus we can just stack-allocate it / embed it into another struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The uri
field is a good candidate indeed, but the normalized_uri
field is initialized on demand (by uriparser_read_uri
). So we would have to track its initialization status by a separate bool field because UriUriA
doesn't provide any indication. Is it still desirable to allocate on stack even if it causes some maintainability degradation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it still desirable to allocate on stack even if it causes some maintainability degradation?
I feel that a separate bool would also be simpler on the maintenance, since you would have less allocs / frees to deal with, but use whatever “feels comfortable”.
ext/uri/php_uriparser.c
Outdated
UriUriA *uriparser_uri = emalloc(sizeof(UriUriA)); | ||
|
||
/* uriparser keeps the originally passed in string, while lexbor may allocate a new one. */ | ||
zend_string *original_uri_str = zend_string_init(ZSTR_VAL(uri_str), ZSTR_LEN(uri_str), false); | ||
if (ZSTR_LEN(original_uri_str) == 0 || | ||
uriParseSingleUriExA(uriparser_uri, ZSTR_VAL(original_uri_str), ZSTR_VAL(original_uri_str) + ZSTR_LEN(original_uri_str), NULL) != URI_SUCCESS | ||
) { | ||
efree(uriparser_uri); | ||
zend_string_release_ex(original_uri_str, false); | ||
if (!silent) { | ||
throw_invalid_uri_exception(); | ||
} | ||
|
||
return NULL; | ||
} | ||
|
||
if (uriparser_base_urls == NULL) { | ||
return uriparser_create_uris(uriparser_uri, original_uri_str, NULL, NULL); | ||
} | ||
|
||
UriUriA *uriparser_base_url = uriparser_copy_uri(uriparser_base_urls->uri); | ||
|
||
UriUriA *absolute_uri = emalloc(sizeof(UriUriA)); | ||
|
||
if (uriAddBaseUriExA(absolute_uri, uriparser_uri, uriparser_base_url, URI_RESOLVE_STRICTLY) != URI_SUCCESS) { | ||
zend_string_release_ex(original_uri_str, false); | ||
uriFreeUriMembersA(uriparser_uri); | ||
efree(uriparser_uri); | ||
uriFreeUriMembersA(uriparser_base_url); | ||
efree(uriparser_base_url); | ||
efree(absolute_uri); | ||
|
||
if (!silent) { | ||
throw_invalid_uri_exception(); | ||
} | ||
|
||
return NULL; | ||
} | ||
|
||
/* TODO fix freeing: if the following code runs, then we'll have use-after-free-s because uriparser doesn't | ||
copy the input. If we don't run the following code, then we'll have memory leaks... | ||
uriFreeUriMembersA(uriparser_base_url); | ||
efree(uriparser_base_url); | ||
uriFreeUriMembersA(uriparser_uri); | ||
efree(uriparser_uri); | ||
*/ | ||
|
||
return uriparser_create_uris(absolute_uri, original_uri_str, NULL, NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rough untested suggestion. Basically: Keep the UriUriA
object stack-allocated for as long as possible.
UriUriA *uriparser_uri = emalloc(sizeof(UriUriA)); | |
/* uriparser keeps the originally passed in string, while lexbor may allocate a new one. */ | |
zend_string *original_uri_str = zend_string_init(ZSTR_VAL(uri_str), ZSTR_LEN(uri_str), false); | |
if (ZSTR_LEN(original_uri_str) == 0 || | |
uriParseSingleUriExA(uriparser_uri, ZSTR_VAL(original_uri_str), ZSTR_VAL(original_uri_str) + ZSTR_LEN(original_uri_str), NULL) != URI_SUCCESS | |
) { | |
efree(uriparser_uri); | |
zend_string_release_ex(original_uri_str, false); | |
if (!silent) { | |
throw_invalid_uri_exception(); | |
} | |
return NULL; | |
} | |
if (uriparser_base_urls == NULL) { | |
return uriparser_create_uris(uriparser_uri, original_uri_str, NULL, NULL); | |
} | |
UriUriA *uriparser_base_url = uriparser_copy_uri(uriparser_base_urls->uri); | |
UriUriA *absolute_uri = emalloc(sizeof(UriUriA)); | |
if (uriAddBaseUriExA(absolute_uri, uriparser_uri, uriparser_base_url, URI_RESOLVE_STRICTLY) != URI_SUCCESS) { | |
zend_string_release_ex(original_uri_str, false); | |
uriFreeUriMembersA(uriparser_uri); | |
efree(uriparser_uri); | |
uriFreeUriMembersA(uriparser_base_url); | |
efree(uriparser_base_url); | |
efree(absolute_uri); | |
if (!silent) { | |
throw_invalid_uri_exception(); | |
} | |
return NULL; | |
} | |
/* TODO fix freeing: if the following code runs, then we'll have use-after-free-s because uriparser doesn't | |
copy the input. If we don't run the following code, then we'll have memory leaks... | |
uriFreeUriMembersA(uriparser_base_url); | |
efree(uriparser_base_url); | |
uriFreeUriMembersA(uriparser_uri); | |
efree(uriparser_uri); | |
*/ | |
return uriparser_create_uris(absolute_uri, original_uri_str, NULL, NULL); | |
UriUriA uriparser_uri; | |
/* uriparser keeps the originally passed in string, while lexbor may allocate a new one. */ | |
zend_string *original_uri_str = zend_string_init(ZSTR_VAL(uri_str), ZSTR_LEN(uri_str), false); | |
if (ZSTR_LEN(original_uri_str) == 0 || | |
uriParseSingleUriExA(&uriparser_uri, ZSTR_VAL(original_uri_str), ZSTR_VAL(original_uri_str) + ZSTR_LEN(original_uri_str), NULL) != URI_SUCCESS | |
) { | |
zend_string_release_ex(original_uri_str, false); | |
if (!silent) { | |
throw_invalid_uri_exception(); | |
} | |
return NULL; | |
} | |
if (uriparser_base_urls == NULL) { | |
return uriparser_create_uris(&uriparser_uri, original_uri_str, NULL, NULL); | |
} | |
UriUriA absolute_uri; | |
if (uriAddBaseUriExA(&absolute_uri, uriparser_uri, &uriparser_base_urls->uri, URI_RESOLVE_STRICTLY) != URI_SUCCESS) { | |
zend_string_release_ex(original_uri_str, false); | |
uriFreeUriMembersA(uriparser_uri); | |
uriFreeUriMembersA(uriparser_base_url); | |
if (!silent) { | |
throw_invalid_uri_exception(); | |
} | |
return NULL; | |
} | |
/* TODO fix freeing: if the following code runs, then we'll have use-after-free-s because uriparser doesn't | |
copy the input. If we don't run the following code, then we'll have memory leaks... | |
uriFreeUriMembersA(uriparser_base_url); | |
efree(uriparser_base_url); | |
uriFreeUriMembersA(uriparser_uri); | |
efree(uriparser_uri); | |
*/ | |
return uriparser_create_uris(&absolute_uri, original_uri_str, NULL, NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented the suggestion of allocating on stack. My feeling is that the end result didn't become any simpler, but I guess there will be some performance benefit (I didn't do any measurements yet). I'm currently re-running the same microbenchmarks on a separate test branch that the original PR had (kocsismate#7).
This comment was marked as resolved.
This comment was marked as resolved.
@TimWolla Thanks for the patch, it's awesome! TBH I didn't even think about using the |
@kocsismate When stack allocating the struct the code should become even simpler (and possibly faster, due to fewer pointer indirection). |
0b3cfd8
to
062dc4d
Compare
Co-authored-by: Tim Düsterhus <[email protected]>
There are a few WIP parts but the PR can already be reviewed.
Relates to #14461 and https://wiki.php.net/rfc/url_parsing_api